Artificial intelligence methods including deep neural networks (DNN) can provide rapid molecular classification of tumors from routine histology with accuracy that matches or exceeds human pathologists. Discerning how neural networks make their predictions remains a significant challenge, but explainability tools help provide insights into what models have learned when corresponding histologic features are poorly defined. Here, we present a method for improving explainability of DNN models using synthetic histology generated by a conditional generative adversarial network (cGAN). We show that cGANs generate high-quality synthetic histology images that can be leveraged for explaining DNN models trained to classify molecularly-subtyped tumors, exposing histologic features associated with molecular state. Fine-tuning synthetic histology through class and layer blending illustrates nuanced morphologic differences between tumor subtypes. Finally, we demonstrate the use of synthetic histology for augmenting pathologist-in-training education, showing that these intuitive visualizations can reinforce and improve understanding of histologic manifestations of tumor biology.
translated by 谷歌翻译
数据不平衡,其中多个数据样本来自一小部分标签,在训练深层神经网络方面构成了挑战。与分类不同,在回归中,标签是连续的,潜在的无限,并形成自然排序。回归的这些独特功能要求采用新技术,以利用标签空间关系中编码的其他信息。本文介绍了深度不平衡回归的Ranksim(排名相似性)正常化程序,该调节器编码一种感应偏置,该偏差在标签空间中更接近的样品在特征空间中也应该更接近。与最近基于分布平滑的方法相反,RankSIM捕获附近和遥远的关系:对于给定的数据样本,RankSIM鼓励其在标签空间中的邻居排序列表,以匹配其特征空间中邻居的排序列表。 Ranksim与常规不平衡的学习技术相辅相成,包括重新加权,两阶段培训和分配平滑,并在三个不平衡的回归基准上提高最先进的性能:IMDB-WIKI-DIR,年龄B-DIR,AgeDB-DIR,AgeDB-DIR,AGENDB-DIR,EAGENDB-DIR,,和STS-B-DIR。
translated by 谷歌翻译
Here, we demonstrate how machine learning enables the prediction of comonomers reactivity ratios based on the molecular structure of monomers. We combined multi-task learning, multi-inputs, and Graph Attention Network to build a model capable of predicting reactivity ratios based on the monomers chemical structures.
translated by 谷歌翻译
We present an extension to masked autoencoders (MAE) which improves on the representations learnt by the model by explicitly encouraging the learning of higher scene-level features. We do this by: (i) the introduction of a perceptual similarity term between generated and real images (ii) incorporating several techniques from the adversarial training literature including multi-scale training and adaptive discriminator augmentation. The combination of these results in not only better pixel reconstruction but also representations which appear to capture better higher-level details within images. More consequentially, we show how our method, Perceptual MAE, leads to better performance when used for downstream tasks outperforming previous methods. We achieve 78.1% top-1 accuracy linear probing on ImageNet-1K and up to 88.1% when fine-tuning, with similar results for other downstream tasks, all without use of additional pre-trained models or data.
translated by 谷歌翻译
The selection of an optimal pacing site, which is ideally scar-free and late activated, is critical to the response of cardiac resynchronization therapy (CRT). Despite the success of current approaches formulating the detection of such late mechanical activation (LMA) regions as a problem of activation time regression, their accuracy remains unsatisfactory, particularly in cases where myocardial scar exists. To address this issue, this paper introduces a multi-task deep learning framework that simultaneously estimates LMA amount and classify the scar-free LMA regions based on cine displacement encoding with stimulated echoes (DENSE) magnetic resonance imaging (MRI). With a newly introduced auxiliary LMA region classification sub-network, our proposed model shows more robustness to the complex pattern cause by myocardial scar, significantly eliminates their negative effects in LMA detection, and in turn improves the performance of scar classification. To evaluate the effectiveness of our method, we tests our model on real cardiac MR images and compare the predicted LMA with the state-of-the-art approaches. It shows that our approach achieves substantially increased accuracy. In addition, we employ the gradient-weighted class activation mapping (Grad-CAM) to visualize the feature maps learned by all methods. Experimental results suggest that our proposed model better recognizes the LMA region pattern.
translated by 谷歌翻译
Human behavior emerges from planning over elaborate decompositions of tasks into goals, subgoals, and low-level actions. How are these decompositions created and used? Here, we propose and evaluate a normative framework for task decomposition based on the simple idea that people decompose tasks to reduce the overall cost of planning while maintaining task performance. Analyzing 11,117 distinct graph-structured planning tasks, we find that our framework justifies several existing heuristics for task decomposition and makes predictions that can be distinguished from two alternative normative accounts. We report a behavioral study of task decomposition ($N=806$) that uses 30 randomly sampled graphs, a larger and more diverse set than that of any previous behavioral study on this topic. We find that human responses are more consistent with our framework for task decomposition than alternative normative accounts and are most consistent with a heuristic -- betweenness centrality -- that is justified by our approach. Taken together, our results provide new theoretical insight into the computational principles underlying the intelligent structuring of goal-directed behavior.
translated by 谷歌翻译
Prior work on ideology prediction has largely focused on single modalities, i.e., text or images. In this work, we introduce the task of multimodal ideology prediction, where a model predicts binary or five-point scale ideological leanings, given a text-image pair with political content. We first collect five new large-scale datasets with English documents and images along with their ideological leanings, covering news articles from a wide range of US mainstream media and social media posts from Reddit and Twitter. We conduct in-depth analyses of news articles and reveal differences in image content and usage across the political spectrum. Furthermore, we perform extensive experiments and ablation studies, demonstrating the effectiveness of targeted pretraining objectives on different model components. Our best-performing model, a late-fusion architecture pretrained with a triplet objective over multimodal content, outperforms the state-of-the-art text-only model by almost 4% and a strong multimodal baseline with no pretraining by over 3%.
translated by 谷歌翻译
语义分割是开发医学图像诊断系统的重要任务。但是,构建注释的医疗数据集很昂贵。因此,在这种情况下,半监督方法很重要。在半监督学习中,标签的质量在模型性能中起着至关重要的作用。在这项工作中,我们提出了一种新的伪标签策略,可提高用于培训学生网络的伪标签的质量。我们遵循多阶段的半监督训练方法,该方法在标记的数据集上训练教师模型,然后使用训练有素的老师将伪标签渲染用于学生培训。通过这样做,伪标签将被更新,并且随着培训的进度更加精确。上一个和我们的方法之间的关键区别在于,我们在学生培训过程中更新教师模型。因此,在学生培训过程中,提高了伪标签的质量。我们还提出了一种简单但有效的策略,以使用动量模型来提高伪标签的质量 - 训练过程中原始模型的慢复制版本。通过应用动量模型与学生培训期间的重新渲染伪标签相结合,我们在五个数据集中平均达到了84.1%的骰子分数(即Kvarsir,CVC-ClinicdB,Etis-laribpolypdb,cvc-colondb,cvc-colondb,cvc-colondb和cvc-300)和CVC-300)只有20%的数据集用作标记数据。我们的结果超过了3%的共同实践,甚至在某些数据集中取得了完全监督的结果。我们的源代码和预培训模型可在https://github.com/sun-asterisk-research/online学习SSL上找到
translated by 谷歌翻译
尽管两阶段矢量量化(VQ)生成模型允许合成高保真性和高分辨率图像,但其量化操作员将图像中的相似贴片编码为相同的索引,从而为相似的相邻区域重复使用现有的解码器体系结构的相似相似区域的重复伪像。为了解决这个问题,我们建议将空间条件的归一化结合起来,以调节量化的向量,以便将空间变体信息插入嵌入式索引图中,从而鼓励解码器生成更真实的图像。此外,我们使用多通道量化来增加离散代码的重组能力,而无需增加模型和代码簿的成本。此外,为了在第二阶段生成离散令牌,我们采用掩盖的生成图像变压器(MaskGit)来学习压缩潜在空间中的基础先验分布,该分布比常规自动回归模型快得多。两个基准数据集的实验表明,我们提出的调制VQGAN能够大大提高重建的图像质量,并提供高保真图像的产生。
translated by 谷歌翻译
套件是指准备和分组必要的零件和工具(或“套件”)以在制造环境中组装。自动化此过程可简化人工工人的组装任务,并提高效率。现有的自动化套件系统遵守脚本指示和预定义的启发式方法。但是,鉴于零件和逻辑延迟的可用性差异,现有系统的僵化性可以限制装配线的整体效率。在本文中,我们提出了一个双重优化框架,以使机器人能够执行基于任务分割的零件选择,套件布置和交付计划,以及时提供定制的套件 - 即在需要时正确。我们通过人类主题研究(n = 18)评估了提出的方法,涉及基于研究的数据构建平板家具桌和购物流仿真。我们的结果表明,与使用由任务图本身定义的刚性任务分割边界定义的基线方法相比,与基线方法相比,与基线方法相比,即将到来的套件系统更有效,对上游商店流量延迟有弹性,并且比较更好地优选。单个套件,包括组装单个单元所需的所有零件。
translated by 谷歌翻译